
perm filename AMANUE.NSI[X,ALS] blob sn#087633 filedate 1974-02-20 generic text, type T, neo UTF8
00100			The Amanuensis Speech Recognition System
00150	
00200					by
00300				James L. Hieronymus
00400				 Neil J. Miller
00500				 Arthur L. Samuel
00600	
00700		 Stanford A.I. Laboratory, Stanford University
00800	
00900				 Abstract
01000	The Amanuensis speech recognition system under development at the
01100	Stanford A.I. Laboratory is a front-end system that attempts to
01200	extract the maximum amount of linguistic information from the
01300	acoustic speech signal and that uses machine learning techniques. It
01400	differs from the system previously reported in a number of important
01500	respects:
01505		1) Parameters for all voiced regions are determined pitch
01506	synchronously.
01600		2) A new acoustic segmenter is used to extract certain
01700	features directly from the acoustic input and to isolate regions for
01800	special treatment.
02100		3) A new formant extractor is used which obviates the need
02200	for tracking and which can be used with FFT data, thus preserving
02300	bandwidth and formant shape information.
02400		4) Use is made of information from both the steady or nearly
02500	steady regions and the transition regions.
02600		5) Speaker normalization is to be done partly by formula and
02700	partly by learning, with a bootstrapping technique proposed to adapt
02800	the system to different speakers.
02900		6) Greater use is made of the redundancy of speech to improve
03000	recognition.
03100		7) An improved form of signature table has been developed for
03200	use by the learning routines, yielding better accuracy and a better
03300	compromise between the need for smoothing and the desire to avoid
03400	unnecessarily large amounts of training material.
03500		8) Several alternate output streams of phonemes are produced
03600	with probability ratings for both the complete streams and for the
03700	individual phonemes, so that it should never be necessary to go
03800	back to the original acoustic data in order to resolve ambiguities or
03900	to incorporate syntactic, semantic and contextual information in the
04000	decision process.
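
As an illustration of item 8, the following minimal sketch shows one way several alternate phoneme streams could be produced with ratings for both the whole stream and the individual phonemes. The abstract does not say how the stream ratings are computed; this sketch assumes that each segment's candidates are independent and that a stream's rating is the product of its phoneme probabilities. The function name top_streams, the data layout, and the phoneme labels in the usage note are hypothetical.

```python
import heapq


def top_streams(segments, n=3):
    """Return the n highest-rated phoneme streams, best first.

    segments: one list per acoustic segment, each holding
    (phoneme, probability) candidates for that segment.
    Each result is (stream_probability, phonemes, phoneme_probabilities).

    ASSUMPTION (not from the source): segments are treated as
    independent, so a stream's rating is the product of its
    per-phoneme probabilities.
    """
    # Start with a single empty stream of probability 1.
    beam = [(1.0, [], [])]
    for candidates in segments:
        # Extend every surviving stream by every candidate phoneme.
        extended = [
            (p_stream * p, phonemes + [ph], probs + [p])
            for p_stream, phonemes, probs in beam
            for ph, p in candidates
        ]
        # Keep only the n best partial streams. Under the product
        # rating this loses none of the n best complete streams
        # (ties aside).
        beam = heapq.nlargest(n, extended, key=lambda s: s[0])
    return beam
```

Called with three segments whose candidates are, say, [("s", 0.7), ("z", 0.3)], [("ih", 0.6), ("eh", 0.4)] and [("t", 0.9), ("d", 0.1)], the function returns the three best streams together with their per-phoneme probabilities, so a later syntactic or semantic stage can choose among them without consulting the acoustic data again.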